The first version of the fix (#12) contained some mistakes, although it did give better final results.
Something to be corrected: `reward_loss` is indeed correct and should not be modified.

Main changes: `returns` in main.py is modified to match the one at line 187 of dreamer.py in Dreamer (TensorFlow 2 implementation).
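
For context, Dreamer-style agents usually compute `returns` as λ-returns over the imagined trajectory. The sketch below is only an illustration of that recursion in plain Python. It is not the code at line 187 of dreamer.py or the code in main.py, and the names `lambda_return`, `rewards`, `values`, `discounts`, `bootstrap`, and `lam` are assumptions chosen for this example.

```python
def lambda_return(rewards, values, discounts, bootstrap, lam):
    """Illustrative lambda-return over an imagined trajectory of length H.

    rewards[t], values[t], discounts[t] are per-step scalars for t = 0..H-1.
    bootstrap is the value estimate for the state after the last step.
    Recursion: R_t = r_t + d_t * ((1 - lam) * v_{t+1} + lam * R_{t+1}),
    with R_H = bootstrap and v_H = bootstrap.
    """
    horizon = len(rewards)
    # Value of the *next* state at each step; the last one is the bootstrap.
    next_values = list(values[1:]) + [bootstrap]
    returns = [0.0] * horizon
    agg = bootstrap
    # Scan backwards so that each step can reuse the return of the next step.
    for t in reversed(range(horizon)):
        agg = rewards[t] + discounts[t] * ((1.0 - lam) * next_values[t] + lam * agg)
        returns[t] = agg
    return returns


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.5, 0.5]
    discounts = [0.9, 0.9, 0.9]
    # lam = 0 gives one-step TD targets; lam = 1 gives discounted Monte Carlo
    # returns that bootstrap only from the final value.
    print(lambda_return(rewards, values, discounts, bootstrap=2.0, lam=0.0))
    print(lambda_return(rewards, values, discounts, bootstrap=2.0, lam=1.0))
```

With `lam` between 0 and 1, the target mixes one-step bootstrapped estimates with longer rollouts, which is one reason the value loss can change when the way `returns` is computed changes.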
Factors contributing to issue #4:
@letusfly85 and @coderlemon17 found that their training results are not as good as the testing results in the original paper (around 700 at 1M steps). This is reasonable, because noise is added to the action during training, while no noise is added during testing. Here are the training and testing results of the original dreamer-pytorch (i.e., the version before applying my fixes) in the walker-run env, where epoch 1000 corresponds to 1M steps.
[Figure: ori_train]
[Figure: ori_test]
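
The gap between training and testing returns can be illustrated with a minimal sketch of action selection with and without exploration noise. This is a hypothetical example, not the repository's code: the function name `select_action`, the Gaussian noise model, the default `noise_std=0.3`, and the clipping range of [-1, 1] are all assumptions made for illustration.

```python
import random


def select_action(policy_action, training, noise_std=0.3, rng=random):
    """Return the action to execute in the environment.

    During training, zero-mean Gaussian noise is added to each action
    dimension and the result is clipped to [-1, 1] (assumed action bounds).
    During testing, the policy action is returned unchanged.
    """
    if not training:
        return list(policy_action)
    noisy = [a + rng.gauss(0.0, noise_std) for a in policy_action]
    return [max(-1.0, min(1.0, a)) for a in noisy]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    action = [0.5, -0.2]
    print("train:", select_action(action, training=True, rng=rng))
    print("test: ", select_action(action, training=False))
```

Because the executed action during training is perturbed, the return measured on training episodes tends to be lower than the return of the same policy evaluated without noise.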
The planning horizon is another factor. Here is the result of dreamer-pytorch + Fix 2, where the testing return is around 700 at epoch 1000 (1M steps).
The testing result of dreamer-pytorch + Fix 1&2 is almost the same as that of dreamer-pytorch + Fix 2:
[Figure: fix_test]
However, their value losses differ: the value loss of dreamer-pytorch + Fix 1&2 is lower than that of dreamer-pytorch + Fix 1: